5 research outputs found
NeRSemble: Multi-view Radiance Field Reconstruction of Human Heads
We focus on reconstructing high-fidelity radiance fields of human heads,
capturing their animations over time, and synthesizing re-renderings from novel
viewpoints at arbitrary time steps. To this end, we propose a new multi-view
capture setup composed of 16 calibrated machine vision cameras that record
time-synchronized images at 7.1 MP resolution and 73 frames per second. With
our setup, we collect a new dataset of over 4700 high-resolution,
high-framerate sequences of more than 220 human heads, from which we introduce
a new human head reconstruction benchmark. The recorded sequences cover a wide
range of facial dynamics, including head motions, natural expressions,
emotions, and spoken language. In order to reconstruct high-fidelity human
heads, we propose Dynamic Neural Radiance Fields using Hash Ensembles
(NeRSemble). We represent scene dynamics by combining a deformation field and
an ensemble of 3D multi-resolution hash encodings. The deformation field allows
for precise modeling of simple scene movements, while the ensemble of hash
encodings helps to represent complex dynamics. As a result, we obtain radiance
field representations of human heads that capture motion over time and
facilitate re-rendering of arbitrary novel viewpoints. In a series of
experiments, we explore the design choices of our method and demonstrate that
our approach outperforms state-of-the-art dynamic radiance field approaches by
a significant margin.
Comment: SIGGRAPH 2023, Project Page:
https://tobias-kirschstein.github.io/nersemble/ , Video:
https://youtu.be/a-OAWqBzld
Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing
In this paper, we define and study a new Cloth2Body problem, whose goal is to
generate 3D human body meshes from a 2D clothing image. Unlike the
existing human mesh recovery problem, Cloth2Body needs to address new and
emerging challenges raised by the partial observation of the input and the high
diversity of the output. Specifically, there are three challenges: first,
how to locate and pose human bodies in the clothes; second, how to
effectively estimate body shapes from various clothing types; and finally, how to
generate diverse and plausible results from a 2D clothing image. To this end,
we propose an end-to-end framework that can accurately estimate a 3D body mesh
parameterized by pose and shape from a 2D clothing image. Along this line, we
first utilize Kinematics-aware Pose Estimation to estimate body pose
parameters. A 3D skeleton is employed as a proxy, followed by an inverse
kinematics module to boost the estimation accuracy. We additionally design an
adaptive depth trick to better align the re-projected 3D mesh with the 2D clothing
image by disentangling the effects of object size and camera extrinsics. Next,
we propose Physics-informed Shape Estimation to estimate body shape parameters.
3D shape parameters are predicted based on partial body measurements estimated
from the RGB image, which not only improves pixel-wise human-cloth alignment but
also enables flexible user editing. Finally, we design an Evolution-based Pose
Generation method, a skeleton transplanting method inspired by genetic
algorithms, to generate diverse and reasonable poses during inference. As shown by
experimental results on both synthetic and real-world data, the proposed
framework achieves state-of-the-art performance and can effectively recover
natural and diverse 3D body meshes from 2D images that align well with
clothing.
Comment: ICCV 2023 Poster
Revisiting Event-based Video Frame Interpolation
Dynamic vision sensors or event cameras provide rich complementary
information for video frame interpolation. Existing state-of-the-art methods
follow the paradigm of combining both synthesis-based and warping networks.
However, few of those methods fully respect the intrinsic characteristics of
event streams. Given that event cameras only encode intensity changes and
polarity rather than color intensities, estimating optical flow from events is
arguably more difficult than from RGB information. We therefore propose to
incorporate RGB information in an event-guided optical flow refinement
strategy. Moreover, in light of the quasi-continuous nature of the time signals
provided by event cameras, we propose a divide-and-conquer strategy in which
event-based intermediate frame synthesis happens incrementally in multiple
simplified stages rather than in a single, long stage. Extensive experiments on
both synthetic and real-world datasets show that these modifications lead to
more reliable and realistic intermediate frame results than previous video
frame interpolation methods. Our findings underline that a careful
consideration of event characteristics such as high temporal density and
elevated noise benefits interpolation accuracy.
Comment: Accepted by IROS 2023. Project Site:
https://jiabenchen.github.io/revisit_even
KGDet: Keypoint-Guided Fashion Detection
Locating and classifying clothes, usually referred to as clothing detection, is a fundamental task in fashion analysis. Motivated by the strong structural characteristics of clothes, we pursue a detection method enhanced by clothing keypoints, which are a compact and effective representation of structure. To incorporate the keypoint cues into clothing detection, we design a simple yet effective Keypoint-Guided clothing Detector, named KGDet. Such a detector can fully utilize the information provided by keypoints in two ways: i) integrating local features around keypoints to benefit both classification and regression; ii) generating accurate bounding boxes from keypoints. To effectively incorporate local features, two alternative modules are proposed. One is a multi-column keypoint-encoding-based feature aggregation module; the other is a keypoint-selection-based feature aggregation module. With either of the above modules as a bridge, a cascade strategy is introduced to refine detection performance progressively. Thanks to the keypoints, our KGDet obtains superior performance on the DeepFashion2 dataset and the FLD dataset with high efficiency.
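
The idea of deriving a bounding box from keypoints can be illustrated with a minimal geometric sketch. This is not KGDet's method, which the abstract describes only at a high level and which is learned; the sketch just takes the tightest axis-aligned box around the visible keypoints. The function name `box_from_keypoints`, the `(x, y, visible)` keypoint layout, and the `margin` parameter are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical layout (not from the paper): (x, y, visible)
Keypoint = Tuple[float, float, bool]
# Box as (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def box_from_keypoints(keypoints: Sequence[Keypoint],
                       margin: float = 0.0) -> Optional[Box]:
    """Return the tightest axis-aligned box around the visible keypoints.

    Illustrative only: a purely geometric stand-in for a learned
    keypoint-to-box step. Returns None when no keypoint is visible.
    """
    visible = [(x, y) for x, y, is_visible in keypoints if is_visible]
    if not visible:
        return None
    xs = [x for x, _ in visible]
    ys = [y for _, y in visible]
    # Optionally pad the box by a fixed margin on every side.
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)


if __name__ == "__main__":
    points = [(10.0, 20.0, True), (30.0, 5.0, True), (100.0, 100.0, False)]
    print(box_from_keypoints(points))
```

Occluded keypoints are ignored here, so the resulting box can underestimate the garment's extent; a learned detector would be expected to compensate for that.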